Descriptor Selection via Log-Sum Regularization for the Biological Activities of Chemical Structure
Authors
Abstract
Quantitative structure-activity relationship (QSAR) modeling seeks a reliable relationship between chemical structure and biological activity in drug design and discovery. (1) Background: In QSAR studies, the chemical structures of compounds are encoded by a large number of descriptors. Redundant, noisy and irrelevant descriptors degrade the QSAR model, and too many descriptors can lead to overfitting or a weak correlation between chemical structure and biological activity. (2) Methods: We use a novel log-sum regularization to select a small set of descriptors that are relevant to biological activity. In addition, we develop a coordinate descent algorithm for the QSAR model that updates the estimated coefficients with a novel univariate log-sum thresholding operator. (3) Results: Experiments on artificial data and four QSAR datasets show that the proposed log-sum method performs well compared with state-of-the-art methods. (4) Conclusions: The proposed multiple linear regression with a log-sum penalty is an effective technique for both descriptor selection and prediction of biological activity.
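The coordinate descent idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the objective (1/(2n))·||y − Xβ||² + λ·Σ log(|β_j| + ε), assumes the columns of X are standardized (zero mean, unit variance) and y is centered, and uses hypothetical names (`log_sum_threshold`, `log_sum_coordinate_descent`). The univariate update solves the one-dimensional problem exactly by comparing the stationary point with zero; the paper's own thresholding operator may be stated differently.

```python
import numpy as np


def log_sum_threshold(z, lam, eps):
    """Minimize 0.5*(b - z)**2 + lam*log(|b| + eps) over b (assumed penalty form)."""
    a = abs(z)
    # Stationary points for b > 0 solve b**2 + (eps - a)*b + (lam - a*eps) = 0.
    disc = (a + eps) ** 2 - 4.0 * lam
    if disc < 0.0:
        return 0.0  # objective is increasing in |b|, so zero is optimal
    b = 0.5 * ((a - eps) + np.sqrt(disc))  # larger root = local minimum
    if b <= 0.0:
        return 0.0
    # Keep the nonzero candidate only if it beats b = 0.
    f_b = 0.5 * (b - a) ** 2 + lam * np.log(b + eps)
    f_0 = 0.5 * a ** 2 + lam * np.log(eps)
    return float(np.sign(z) * b) if f_b < f_0 else 0.0


def log_sum_coordinate_descent(X, y, lam, eps=0.1, max_iter=200, tol=1e-8):
    """Cyclic coordinate descent; assumes standardized columns of X and centered y."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()  # resid = y - X @ beta
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Univariate least-squares target for coefficient j.
            z = X[:, j] @ resid / n + beta[j]
            new = log_sum_threshold(z, lam, eps)
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Synthetic data (illustrative only): 3 relevant descriptors out of 20.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 20))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    true_beta = np.zeros(20)
    true_beta[:3] = [3.0, -2.0, 1.5]
    y = X @ true_beta + 0.1 * rng.standard_normal(200)
    y = y - y.mean()
    beta = log_sum_coordinate_descent(X, y, lam=0.1, eps=0.1)
    print("selected descriptors:", np.flatnonzero(beta))
```

The values of `lam` and `eps` here are arbitrary; in practice they would typically be chosen by cross-validation. Because the log-sum penalty is non-convex, coordinate descent is only expected to reach a local solution, which may depend on the starting point.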
Related resources
A Priori Prediction of Tissue: Plasma Partition Coefficients (Log BP) of Drugs to Facilitate the Use of MLR and MLR-GA Methods
It is important to determine whether a candidate molecule is capable of penetrating the plasma-brain barrier in drug discovery and development. The aim of this paper is to establish a predictive model for plasma-brain barrier penetration using simple descriptors. The usefulness of the quantum chemical descriptors, calculated at the level of the DFT and HF theories using the 6-31G* basis set for QSAR st...
A novel topological descriptor based on the expanded Wiener index: Applications to QSPR/QSAR studies
In this paper, a novel topological index, named the M-index, is introduced based on an expanded form of the Wiener matrix. To construct this index, the atomic characteristics and the interactions of the vertices in a molecule are taken into account. The usefulness of the M-index is demonstrated by several QSPR/QSAR models for different physico-chemical properties and biological activities of a large...
Quantitative structure-activity relationship (QSAR) study of CCR2b receptor inhibitors using SW-MLR and GA-MLR approaches
In this paper, the quantitative structure-activity relationship (QSAR) of the CCR2b receptor inhibitors was examined. First, the molecular descriptors were calculated using the Dragon package. Then, the stepwise multiple linear regression (SW-MLR) and genetic algorithm multiple linear regression (GA-MLR) variable selection methods were employed to select and implement th...
On the Conditions of Sparse Parameter Estimation via Log-Sum Penalty Regularization
For high-dimensional sparse parameter estimation problems, Log-Sum Penalty (LSP) regularization effectively reduces the required sample size in practice. However, theoretical analysis supporting these empirical findings is still lacking. The analysis in this article shows that, like ℓ0-regularization, a sample size of O(s) is enough for proper LSP, where s is the number of non-zero components of...
A New Approach for Determination of Neck-Pore Size Distribution of Porous Membranes via Bubble Point Data
Reliable estimation of the neck-pore size distribution (NPSD) of porous membranes is a key element in the design and operation of all membrane separation processes. In this paper, a new approach is presented for reliable estimation of the NPSD of porous membranes using wet flow-state bubble point test data. For this purpose, a robust method based on linear regularization theory is developed to extract NPSD...